Chromatin Immunoprecipitation Sequencing ◾ 243
6.3.9 Motif Discovery
The major goal of ChIP-Seq is to determine the binding sites where TFs, Pol II, and
histone marks interact with the genomic DNA to control the transcription of genes. Those
sites have sequence patterns that are recognized by the targeted proteins. A genomic
sequence pattern with such biological activity is called a motif. Motifs are usually
found in the regulatory regions of genes and are therefore most likely to be found in the
peak enrichment regions. Motif enrichment analysis detects enrichment of known binding
motifs in those regulatory regions. Researchers use it to detect the binding site patterns
of a known library of TFs that are believed to regulate a specific set of genes. Motifs are
searched within a window of a specified size around the ChIP-Seq peaks. Remember that the
peak enrichment is stored in the “*peaks.narrowPeak” files. However, motif detection
programs require FASTA sequences as input, so we need to generate FASTA sequences from
BED files. We can create the BED files by extracting the first three columns (chromosome,
start, and end) from the “*peaks.narrowPeak” files as follows:
mkdir motifs
cut -f 1,2,3 \
macs3output/chip1_peaks.narrowPeak \
> motifs/chip1_peaks.bed
cut -f 1,2,3 \
macs3output/chip2_peaks.narrowPeak \
> motifs/chip2_peaks.bed
cut -f 1,2,3 \
macs3output/chip3_peaks.narrowPeak \
> motifs/chip3_peaks.bed
The above commands create a new directory, “motifs”, and store the newly created BED files
in it. We will extract FASTA sequences from each of these three files using bedtools, which
is a collection of programs for manipulating BED files. On Ubuntu, you can install bedtools
using “apt-get install bedtools”.
Visit the program website “https://bedtools.readthedocs.io/en/latest/content/installation.html”
for more information.
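The BED files created above span each peak in full. Because motifs are searched within a
window of a specified size around each peak, an alternative is to center a fixed-width
window on the peak summit. The following is only a sketch under stated assumptions: the
function name “summit_windows”, the half-width of 100 bp, and the toy peak values are
made up for illustration. It relies on column 10 of the narrowPeak format, which holds
the summit offset from the peak start (or -1 when no summit was called).

```shell
set -eu

# Hypothetical helper: print a BED window of +/- half_width bp around each summit.
#   $1 = narrowPeak file, $2 = half width in bp
summit_windows() {
    awk -v OFS='\t' -v h="$2" '
        {
            # Column 10 is the summit offset from the start (column 2).
            # Fall back to the peak midpoint when no summit was called (-1).
            summit = ($10 >= 0) ? $2 + $10 : int(($2 + $3) / 2)
            start = summit - h
            if (start < 0) start = 0    # do not run past the chromosome start
            print $1, start, summit + h
        }' "$1"
}

# Toy narrowPeak file with made-up values so that the sketch runs on its own.
demo_dir="$(mktemp -d)"
printf 'chr1\t1000\t1400\tpeak1\t250\t.\t8.5\t30.1\t27.4\t150\n' >  "$demo_dir/demo_peaks.narrowPeak"
printf 'chr2\t20\t120\tpeak2\t90\t.\t4.2\t12.0\t9.8\t30\n'       >> "$demo_dir/demo_peaks.narrowPeak"

summit_windows "$demo_dir/demo_peaks.narrowPeak" 100 > "$demo_dir/demo_windows.bed"
cat "$demo_dir/demo_windows.bed"
```

The sketch clips a window only at the chromosome start. A window may still extend past
the chromosome end, so the result should be checked against the chromosome lengths of
the reference before the sequences are extracted.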
The “bedtools getfasta” command is used to extract a FASTA file from each BED file.
This command requires the FASTA file of the reference sequence (“-fi”) and a BED file
(“-bed”) as input, and it writes the extracted sequences to the file given by “-fo”.
We will use the same reference sequence that we used to generate the BAM files.
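Since the extraction depends on the BED file and the reference FASTA agreeing on
chromosome names (for example, “chr1” versus “1”), it may help to compare the two
before running it. Intervals on a chromosome that is absent from the reference are
typically skipped with a warning. The following is a minimal sketch of such a check:
the function name “missing_chroms” and the toy files are hypothetical and are not part
of bedtools.

```shell
set -eu

# Hypothetical helper: list chromosome names that appear in a BED file
# but have no matching sequence header in the reference FASTA.
#   $1 = BED file, $2 = reference FASTA
missing_chroms() {
    awk '
        NR == FNR {                       # first file: the reference FASTA
            if (/^>/) { sub(/^>/, ""); seen[$1] = 1 }   # name = text up to first space
            next
        }
        !($1 in seen) && !reported[$1]++ { print $1 }   # second file: the BED
    ' "$2" "$1"
}

# Toy reference and BED files with made-up content so that the sketch runs on its own.
demo_dir="$(mktemp -d)"
printf '>chr1\nACGTACGTAC\n>chr2 toy description\nGGGGCCCCAA\n' > "$demo_dir/demo_ref.fa"
printf 'chr1\t0\t4\nchr2\t2\t6\n3\t1\t5\n' > "$demo_dir/demo_peaks.bed"

missing="$(missing_chroms "$demo_dir/demo_peaks.bed" "$demo_dir/demo_ref.fa")"
echo "chromosomes absent from the reference: ${missing:-none}"
```

In the toy data, the third interval uses the name “3”, which has no counterpart among
the reference headers, so it is the only name reported. With the files of this section,
the same function could be called on “motifs/chip1_peaks.bed” and “ref/hg19.fa”.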
bedtools getfasta \
-fi ref/hg19.fa \
-bed motifs/chip1_peaks.bed \
-fo motifs/chip1_peaks.fasta
bedtools getfasta \
-fi ref/hg19.fa \
-bed motifs/chip2_peaks.bed \
-fo motifs/chip2_peaks.fasta